1 About

We eventually aim to run a fine-mapping analysis informed by functional annotations, and the tool suited to this aim is PAINTOR. However, we first need to ensure that a basic analysis (without functional annotation) is reliable and that its results are consistent. Several tools are available, including PAINTOR, FINEMAP and CAVIAR. All three belong to the class of fine-mapping methods that take GWAS summary statistics as input and require an estimate of the LD pattern from a reference panel.

We selected FINEMAP for the basic analysis (without functional annotation) because its statistical model is similar to PAINTOR's, but FINEMAP allows more control and its output is more verbose (the number of causal SNPs and the priors are specified explicitly; see also this discussion). Moreover, the FINEMAP authors recently published a paper on running fine-mapping analysis on large-scale datasets (Benner et al. 2017). The third and fourth paragraphs of its Discussion section, quoted below, are particularly relevant to the results reported here.

We also showed that the size of the reference panel must scale with the GWAS sample size. Although a panel of 1,000 samples is adequate for a GWAS sample size of 10,000, a panel of 10,000 samples is needed for a GWAS sample size of 50,000. This result has important consequences for ongoing large meta-analysis efforts and biobank studies. We confirmed the result in three ways: empirically through simulations, analytically through likelihood evaluations, and theoretically through mathematical derivation.

In our analyses, we used FINEMAP software, which is based on a stochastic search algorithm. We verified that the results of FINEMAP were consistent across separate runs when the LD information provided a good approximation of the LD information from the original genotype data. We also observed that inaccurate LD information or mismatches in the allele coding between the reference panel and GWAS data could lead to an inflation of false positives and also to an inconsistency between the FINEMAP results across separate runs. Such problems typically manifest when the posterior probability of the number of causal variants concentrates on the maximum value possible and can therefore be detected by comparison of several FINEMAP runs that allow for increasing numbers of causal variants.
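The failure signature described in the quoted paragraph (the posterior on the number of causal variants piling up at the largest value a run allows) can be checked mechanically once several runs with increasing maximum numbers of causals are available. The sketch below is a minimal illustration under stated assumptions: the helper names `peaks_at_max` and `flag_runs` are hypothetical, each run's posterior is passed in as a plain `{k: probability}` dictionary (parsing FINEMAP's output files is not shown), and the example posteriors are made up for illustration.

```python
from typing import Dict, List


def peaks_at_max(ncausal_post: Dict[int, float]) -> bool:
    """True if the posterior on the number of causal SNPs (k -> probability)
    has its mode at the largest k the run allowed."""
    k_mode = max(ncausal_post, key=ncausal_post.get)
    return k_mode == max(ncausal_post)


def flag_runs(runs: Dict[int, Dict[int, float]]) -> List[int]:
    """runs maps each run's maximum number of causals (K_max) to its posterior.
    Returns the K_max values of the runs whose posterior peaks at K_max."""
    return sorted(k_max for k_max, post in runs.items() if peaks_at_max(post))


# Hypothetical posteriors, for illustration only (not results from this project).
unstable = {3: {1: 0.01, 2: 0.09, 3: 0.90},
            5: {1: 0.01, 2: 0.04, 3: 0.10, 4: 0.15, 5: 0.70}}
stable = {3: {1: 0.60, 2: 0.30, 3: 0.10},
          5: {1: 0.58, 2: 0.30, 3: 0.08, 4: 0.03, 5: 0.01}}

print(flag_runs(unstable))  # → [3, 5]: each run piles up at its own upper bound
print(flag_runs(stable))    # → []
```

A locus whose flagged set keeps growing as K_max increases is the pattern the quote associates with inaccurate LD or allele-coding mismatches.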

1.1 Current results

Analysis setup:

  • SNPs with MAF <1% are excluded
  • LD is estimated from the 1000 Genomes reference panel (EUR super-population, ~500 individuals)
  • We run FINEMAP with a maximum of 3 causal SNPs
    • The window size is 1 Mb
    • We controlled for mismatches in reference-allele coding by taking absolute values of both the LD and the Z-score data (so any remaining inconsistencies should come from LD mismatches rather than from allele coding).
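The setup above (MAF filter, 1 Mb window, absolute values of Z-scores and LD) can be sketched as a single preprocessing step. This is a minimal sketch, not the pipeline actually used here: the function name `prepare_locus` and its parameters are hypothetical, the 1 Mb window is assumed to be centred on the lead SNP (±500 kb), the MAF cut-off follows the "MAF > 1%" group of section 2.2, and `ld` is assumed to be a SNP-by-SNP correlation matrix already aligned with the summary statistics.

```python
import numpy as np


def prepare_locus(pos, maf, effect, stderr, ld, lead_pos,
                  half_window=500_000, maf_min=0.01):
    """Return (indices, |Z|, |LD|) for SNPs inside the window that pass the MAF filter.

    Assumptions (hypothetical, not taken from the original pipeline):
      - the 1 Mb window is centred on the lead SNP (+/- half_window bp);
      - ld is a square correlation matrix in the same SNP order as pos.
    """
    pos = np.asarray(pos)
    maf = np.asarray(maf, dtype=float)
    keep = (np.abs(pos - lead_pos) <= half_window) & (maf > maf_min)
    idx = np.flatnonzero(keep)
    # Wald Z-scores for the retained SNPs.
    z = np.asarray(effect, dtype=float)[idx] / np.asarray(stderr, dtype=float)[idx]
    # Absolute values remove sign flips caused by allele-coding mismatches
    # between the GWAS and the reference panel (and also discard the true sign).
    ld_sub = np.abs(np.asarray(ld, dtype=float)[np.ix_(idx, idx)])
    return idx, np.abs(z), ld_sub


# Toy example: five SNPs around a lead SNP at 600 kb.
ld = np.eye(5)
ld[1, 3] = ld[3, 1] = -0.4
idx, z_abs, ld_abs = prepare_locus(
    pos=[100, 400_000, 550_000, 700_000, 1_200_000],
    maf=[0.2, 0.3, 0.005, 0.25, 0.4],
    effect=[0.1, -0.2, 0.3, 0.05, 0.1],
    stderr=[0.01, 0.02, 0.01, 0.01, 0.01],
    ld=ld, lead_pos=600_000)
print(idx, z_abs, ld_abs, sep="\n")
```

Only SNPs 1 and 3 survive here: SNPs 0 and 4 fall outside the window and SNP 2 fails the MAF filter.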

1.2 Next steps to overcome caveats

  • Decide how to filter SNPs by MAF (e.g., require a minimum cohort-specific MAC > 10?)
  • Find a better way to estimate LD
    • Use raw genotype data from one of the cohorts (Harvard, 23andMe, Kaiser, Ohio, Rotterdam, deCODE)
    • Use ~300K UK Biobank individuals of British ancestry
  • Run FINEMAP with up to 10 causal SNPs (as recommended by Benner et al. 2017) and check the diagnostic plot (the posterior probabilities on the number of causal SNPs)
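The cohort-specific MAC filter proposed in the first item relies on the relation MAC = 2 × N × MAF for N diploid individuals. Below is a minimal sketch, assuming per-cohort MAFs and sample sizes are available (the meta-analysis summary file may not contain them). The function name `passes_mac_filter` and the cohort sizes in the example are hypothetical and do not describe the actual cohorts.

```python
import numpy as np


def passes_mac_filter(cohort_maf, cohort_n, min_mac=10):
    """cohort_maf: (n_snps, n_cohorts) array of cohort-specific MAFs.
    cohort_n: length-n_cohorts array of sample sizes (diploid individuals).
    Returns a boolean mask that is True where every cohort has MAC > min_mac."""
    cohort_maf = np.asarray(cohort_maf, dtype=float)
    cohort_n = np.asarray(cohort_n, dtype=float)
    # Minor allele count = 2 chromosomes per individual x N x MAF.
    mac = 2.0 * cohort_n[np.newaxis, :] * cohort_maf
    return mac.min(axis=1) > min_mac


# Hypothetical cohort sizes and MAFs, for illustration only.
n = [1000, 5000]
maf = [[0.010, 0.010],   # MAC 20 and 100   -> kept
       [0.004, 0.020],   # MAC 8 in cohort 1 -> dropped
       [0.006, 0.003]]   # MAC 12 and 30    -> kept
print(passes_mac_filter(maf, n))
```

Unlike a single global MAF threshold, this keeps a variant only if it is adequately observed in every cohort, so the effective cut-off differs between small and large cohorts.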

There are also alternative fine-mapping methods that don’t require LD information:

  • See a recent article (Mahajan et al. 2018)

2 Analysis setup

2.1 Top loci

Locus MarkerName Chr Pos cytoband gene.context Major Minor MAF Effect StdErr
1 rs10399947 1 150,861,960 1q21.3 ARNT–[]–SETDB1 G A 0.369 -0.06 0.01
2 rs10200279 2 202,170,655 2q33.1 [ALS2CR12] C T 0.287 0.07 0.01
3 rs192481803 2 35,336,564 2p22.3 [] C T 0.007 0.65 0.12
4 rs62246017 3 71,483,084 3p13 FOXP1—[]—EIF4E3 G A 0.325 0.07 0.01
5 rs6791479 3 189,205,032 3q28 TPRG1—[]—TP63 A T 0.427 0.09 0.01
6 rs35407 5 33,946,571 5p13.2 [SLC45A2] G A 0.042 -0.47 0.04
7 rs4455710 6 32,608,858 6p21.32 [HLA-DQA1] C T 0.368 0.14 0.02
8 rs12203592 6 396,321 6p25.3 [IRF4] C T 0.166 0.44 0.01
9 rs10944479 6 90,880,393 6q15 [BACH2] G A 0.189 -0.09 0.02
10 rs117132860 7 17,134,708 7p21.1 AGR3—[]—AHR G A 0.021 0.25 0.04
11 rs7834300 8 116,611,632 8q23.3 [TRPS1] C G 0.438 0.07 0.01
12 rs1325118 9 12,619,616 9p23 []–TYRP1 T C 0.304 -0.07 0.01
13 rs10810657 9 16,884,586 9p22.2 BNC2–[]—CNTLN A T 0.404 -0.10 0.01
14 rs57994353 9 139,356,987 9q34.3 [SEC16A] T C 0.284 0.09 0.01
15 rs1126809 11 89,017,961 11q14.3 [TYR] G A 0.279 0.15 0.01
16 rs74899442 11 115,890,279 11q23.3 CADM1—[]—BUD13 T C 0.004 0.60 0.11
17 rs7939541 11 9,590,389 11p15.4 ZNF143–[]-WEE1 T C 0.410 0.08 0.01
18 rs657187 12 52,898,985 12q13.13 KRT6A–[]-KRT5 A G 0.420 -0.07 0.01
19 rs721199 12 96,374,057 12q23.1 [HAL] C T 0.463 -0.06 0.01
20 rs1800407 15 28,230,318 15q13.1 [OCA2] C T 0.070 0.16 0.02
21 rs1805007 16 89,986,117 16q24.3 TCF25-[]-TUBB3 C T 0.078 0.38 0.02
22 rs6059655 20 32,665,748 20q11.22 [RALY] G A 0.077 0.25 0.02

All column names in the summary statistics file: MarkerName, Chr, Pos, cytoband, gene.context, Major, Minor, MAF, Effect, StdErr, P.value, HetISq, HetChiSq, HetDf, HetPVal.
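Fine-mapping operates on per-SNP association statistics (the rank_z column in section 4 ranks SNPs by Z-score), and the Z-score of a lead SNP follows directly from the Effect and StdErr columns. The sketch below uses a few rows copied from the table above. Because Effect and StdErr are rounded to two decimals, the resulting values are only approximate. The helper names are hypothetical, and the two-sided p-value assumes a standard normal null.

```python
import math

# Effect and StdErr copied from the top-loci table above (rounded to 2 d.p.,
# so the Z-scores computed here are only approximate).
top_loci = {
    "rs10399947": (-0.06, 0.01),
    "rs35407": (-0.47, 0.04),
    "rs1805007": (0.38, 0.02),
    "rs12203592": (0.44, 0.01),
}


def z_score(effect, stderr):
    """Wald Z-score: estimated effect divided by its standard error."""
    return effect / stderr


def two_sided_p(z):
    """Two-sided p-value under a standard normal null.
    For very large |Z| (e.g. ~44) this underflows to 0.0 in double precision."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for snp, (beta, se) in top_loci.items():
    z = z_score(beta, se)
    print(f"{snp}\tZ = {z:.2f}\tp = {two_sided_p(z):.2e}")
```

For the strongest signals, the P.value column of the original file (or a log-scale computation) is preferable to recomputing p from rounded estimates.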

2.2 Filter by MAF 1%

Group Number of variants
Total 24,707,509
MAF > 1% 10,792,565
MAF <= 1% 13,914,944

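The MAF values in the original summary statistics file range from 0 to 1. Since a minor allele frequency cannot exceed 0.5, this column may actually hold the frequency of a particular allele rather than of the minor one. Open questions: How was MAF computed (using all cohorts or a subset) and then used in the GWAS? Is cohort-specific filtering by MAC better? How should we filter by MAF/MAC in fine-mapping?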
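If the column does hold the frequency of one fixed allele (an assumption, since the file does not say which allele it refers to), it must be folded to a true MAF before filtering. Otherwise a "MAF > 1%" filter would keep variants with frequency above 0.99, which are rare once folded. A minimal sketch with made-up frequencies, using a hypothetical helper `fold_to_maf`:

```python
import numpy as np


def fold_to_maf(freq):
    """Convert an allele frequency in [0, 1] to a minor allele frequency in [0, 0.5]."""
    freq = np.asarray(freq, dtype=float)
    return np.minimum(freq, 1.0 - freq)


# Made-up allele frequencies, for illustration only.
freq = np.array([0.002, 0.37, 0.63, 0.995, 0.5])
maf = fold_to_maf(freq)
print(maf)
print("MAF > 1%:", int((maf > 0.01).sum()), " MAF <= 1%:", int((maf <= 0.01).sum()))
```

Here the raw column would put 4 of the 5 variants above 1%, while the folded MAF puts only 3 there: the variant at frequency 0.995 moves to the rare group.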

3 Select a window size

Plot description:

Notes:

4 Fine-mapping analysis by FINEMAP (3 causals) for the first 6 loci

Plot description:

Notes:

4.1 Locus1 (failed diagnostics)

Fine-mapping failed at this locus: the posterior probability on the number of causal SNPs is highest at the maximum number allowed (3). In addition, the top SNPs ranked by posterior probability of being causal (rank_pp column) rank far lower by Z-score (rank_z column). For example, rs78278355 has rank_pp 2 but rank_z 1446.

 - tables of results: `config`, `snp`, `ncausal`
 - locus: 1 
  -- config:
  -- input snps: 2370 fine-mapped + 310 missing Z/LD = 2680 in total
# A tibble: 10 x 4
   rank config                           config_prob config_log10bf
  <int> <chr>                                  <dbl>          <dbl>
1     1 rs10399947,rs12090215,rs78278355       0.167           23.6
2     2 rs1134067,rs12090215,rs78278355        0.130           23.5
3     3 rs6686064,rs12090215,rs78278355        0.113           23.5
# ... with 7 more rows
  -- snp:
# A tibble: 2,680 x 6
  snp        rank_z rank_pp snp_prob snp_prob_cumsum snp_log10bf
  <chr>       <int>   <int>    <dbl>           <dbl>       <dbl>
1 rs12090215     19       1    1.00            0.333       13.2 
2 rs78278355   1446       2    1.00            0.667       13.2 
3 rs10399947      1       3    0.167           0.722        2.50
# ... with 2,677 more rows
  -- 9 snps in 95% credible set: rs12090215, rs78278355, rs10399947, rs1134067, rs6686064, rs11204733, rs4970928, rs6660845, rs11587444...
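The second diagnostic above (top SNPs by posterior probability sitting far down the Z-score ranking) can be expressed as a simple check on the `snp` table. The sketch below uses the three rows printed for Locus1 and for Locus3 (section 4.2). The function name `far_from_z_top` and the rank threshold of 100 are hypothetical choices for illustration, not a criterion from FINEMAP or Benner et al. (2017).

```python
def far_from_z_top(rows, top_n=3, max_z_rank=100):
    """rows: (snp, rank_z, rank_pp) tuples as in the `snp` result table.
    Returns the SNPs among the top_n by posterior probability whose
    Z-score rank is worse than max_z_rank (an illustrative threshold)."""
    top = sorted(rows, key=lambda r: r[2])[:top_n]
    return [snp for snp, rank_z, _ in top if rank_z > max_z_rank]


# First three rows of the `snp` tables printed in sections 4.1 and 4.2.
locus1 = [("rs12090215", 19, 1), ("rs78278355", 1446, 2), ("rs10399947", 1, 3)]
locus3 = [("rs61249550", 3, 1), ("rs72242061", 1, 2), ("rs60100018", 2, 3)]

print(far_from_z_top(locus1))  # → ['rs78278355']
print(far_from_z_top(locus3))  # → []
```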

4.2 Locus3 (passed diagnostics)

 - tables of results: `config`, `snp`, `ncausal`
 - locus: 1 
  -- config:
  -- input snps: 4148 fine-mapped + 406 missing Z/LD = 4554 in total
# A tibble: 10 x 4
   rank config                config_prob config_log10bf
  <int> <chr>                       <dbl>          <dbl>
1     1 rs72242061                  0.203           3.92
2     2 rs72242061,rs61249550       0.191           7.51
3     3 rs60100018                  0.121           3.69
# ... with 7 more rows
  -- snp:
# A tibble: 4,554 x 6
  snp        rank_z rank_pp snp_prob snp_prob_cumsum snp_log10bf
  <chr>       <int>   <int>    <dbl>           <dbl>       <dbl>
1 rs61249550      3       1    0.475           0.347        3.40
2 rs72242061      1       2    0.395           0.635        3.26
3 rs60100018      2       3    0.184           0.769        2.79
# ... with 4,551 more rows
  -- 4 snps in 95% credible set: rs61249550, rs72242061, rs60100018, rs144368575...
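The printed snp_prob_cumsum values appear to be the cumulative snp_prob divided by the total snp_prob over all SNPs. For Locus1, 1.00 / 0.333 implies a total of about 3. For Locus3, the first three rows consistently imply a total of about 1.37. (The sum of per-SNP posterior probabilities equals the posterior expected number of causal SNPs.) Assuming the 95% credible set is the shortest prefix of the ranked list whose normalized cumulative sum reaches 0.95, it can be reproduced as below. This is a hypothetical helper, `credible_set`, run on toy probabilities, and it is not necessarily FINEMAP's own credible-set definition.

```python
import numpy as np


def credible_set(snps, snp_prob, level=0.95):
    """Smallest set of top-ranked SNPs whose share of the summed snp_prob
    reaches `level`. Returns (set members, normalized cumulative sums)."""
    snp_prob = np.asarray(snp_prob, dtype=float)
    order = np.argsort(-snp_prob, kind="stable")  # highest probability first
    cumsum = np.cumsum(snp_prob[order]) / snp_prob.sum()
    # searchsorted (side='left') gives the first position where cumsum >= level.
    n = min(int(np.searchsorted(cumsum, level)) + 1, len(snps))
    return [snps[i] for i in order[:n]], cumsum


# Toy probabilities (made up); scaling them leaves the set unchanged.
members, cs = credible_set(["d", "a", "c", "b"], [0.04, 0.70, 0.06, 0.20])
print(members, np.round(cs, 3))
```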

5 Regions for all 22 loci

6 Previous reports

Bare reports: only numbers, tables and figures.

References

Benner, Christian, Aki S Havulinna, Marjo-Riitta Järvelin, Veikko Salomaa, Samuli Ripatti, and Matti Pirinen. 2017. “Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-Wide Association Studies.” The American Journal of Human Genetics 101 (4): 539–51.

Mahajan, Anubha, Daniel Taliun, Matthias Thurner, Neil R Robertson, Jason M Torres, N William Rayner, Valgerdur Steinthorsdottir, et al. 2018. “Fine-Mapping of an Expanded Set of Type 2 Diabetes Loci to Single-Variant Resolution Using High-Density Imputation and Islet-Specific Epigenome Maps.” bioRxiv 245506.